{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d9659e99-9f5e-4e40-8f9a-b119c1e6a377",
   "metadata": {
    "tags": []
   },
   "source": [
    "\n",
    "# K最近邻简介\n",
    "\n",
    "\n",
    "K最近邻属于一种估值或分类算法，他的解释很容易。\n",
    "我们假设一个人的优秀成为设定为1、2、3、4、5、6、7、8、9、10数值表示，其中10表示最优秀，1表示最不优秀。\n",
    "我们都知道近朱者赤，近墨者黑，我们想看一个人是什么样的，看他的朋友是什么样的就可以了。\n",
    "\n",
    "1、如何来考察一个人？\n",
    "我们通过特征属性来描述一个对象的特征，例如是否抽烟、是否运动、工资、年龄。\n",
    "\n",
    "2、怎样才算是跟朋友亲近？\n",
    "通过计算距离函数来表示两个对象的相似度。例如具有相同的习惯、兴趣\n",
    "\n",
    "3、不同亲近程度的朋友对考察人的影响？\n",
    "通过为不同距离设置权重，来表示不同近邻对待测对象的亲近度。例如5个亲近朋友中，最亲密的两个朋友对考察人物的影响非常大，其他三个朋友的影响几乎可以忽略。\n",
    "\n",
    "4、选多少个亲近朋友来考察比较合适呢？\n",
    "k的取值问题\n",
    "\n",
    "\t\n",
    "**K最近邻（KNN）概念：**\n",
    "\n",
    "它的工作原理是：存在一个样本数据集合，所有特征属性已知，并且样本集中每个对象都已知所属分类。对不知道分类的待测对象，将待测对象的每个特征属性与样本集中数据对应的特征属性进行比较，然后算法提取样本最相似对象(最近邻)的分类标签。一般来说，我们只选择样本数据集中前k个最相似的对象数据，这就是k-近邻算法中k的出处，通常k是不大于20的整数。最后根据这k个数据的特征和属性，判断待测数据的分类。\n",
    "\n",
    "# K最近的三个基本要素\n",
    "\n",
    "\n",
    "1、k值的选取。（在应用中，k值一般选择一个比较小的值，一般选用交叉验证来取最优的k值）\n",
    "\n",
    "2、距离度量。（$L_p$距离：误差绝对值p次方求和再求p次根。欧式距离：p=2的$L_p$距离。曼哈顿距离：p=1的$L_p$距离。p为无穷大时，$L_p$距离为各个维度上距离的最大值）\n",
    "\n",
    "3、分类决策规则。（也就是如何根据k个最近邻决定待测对象的分类。k最近邻的分类决策规则一般选用多数表决）\n",
    "\n",
    "\n",
    "**KNN基本执行步骤：**\n",
    "\n",
    "（1）计算待测对象和训练集中每个样本点的欧式距离\n",
    "（2）对上面的所有距离值排序\n",
    "（3）选出k个最小距离的样本作为“选民”\n",
    "（4）根据“选民”预测待测样本的分类或值\n",
    "\n",
    "**KNN特点**\n",
    "（1）原理简单\n",
    "（2）保存模型需要保存所有样本集\n",
    "（3）训练过程很快，预测速度很慢\n",
    "\n",
    "**优点**\n",
    "\n",
    "简单好用，容易理解，精度高，理论成熟，既可以用来做分类也可以用来做回归；\n",
    "可用于非线性分类；\n",
    "可用于数值型数据和离散型数据（既可以用来估值，又可以用来分类）\n",
    "训练时间复杂度为O(n)；无数据输入假定；\n",
    "对异常值不敏感。\n",
    "准确度高，对数据没有假设，对outlier不敏感；\n",
    "\n",
    "**缺点：**\n",
    "\n",
    "计算复杂性高；空间复杂性高；需要大量的内存\n",
    "样本不平衡问题（即有些类别的样本数量很多，而其它样本的数量很少）；\n",
    "一般数值很大的时候不用这个，计算量太大。但是单个样本又不能太少，否则容易发生误分。\n",
    "最大的缺点是无法给出数据的内在含义。\n",
    "\n",
    "**在上面的描述中思考以下问题：**\n",
    "\n",
    "> 样本属性如何选择？如何计算两个对象间距离？当样本各属性的类型和尺度不同时如何处理？各属性不同重要程度如何处理？模型的好坏如何评估？\n",
    "\n",
    "\n",
    "# 估值案例\n",
    "\n",
    "今天我们使用k最近邻算法来构建白酒的价格模型。这是一个估值模型，不是分类案例。需要预测的回归值为白酒的价格\n",
    "\n",
    "# 构造数据集\n",
    "\n",
    "\n",
    "我们先考虑一个简单的价格模型。我们知道白酒的价格跟等级、年代有很大的关系。我们需要知道一批白酒的数据作为样本数据集，其中包含每瓶白酒的等级、年代、和价格（输出值）。\n",
    "\n",
    "这批数据集你可以使用市场问卷调查，或者网络爬虫获取。为了简单我们自己模拟产生。\n",
    "\n",
    "构建一个葡萄酒样本数据集。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "9927b1fc-2d87-4bd2-974c-848801251397",
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import random,randint\n",
    "import math\n",
    "\n",
    "# 根据等级和年代对价格进行模拟\n",
    "def wineprice(rating,age):\n",
    "    peak_age=rating-50\n",
    "\n",
    "    # 根据等级计算价格\n",
    "    price=rating/2\n",
    "    if age>peak_age:\n",
    "        # 经过“峰值年”，后续5年里其品质将会变差\n",
    "        price=price*(5-(age-peak_age)/2)\n",
    "    else:\n",
    "        # 价格在接近“峰值年”时会增加到原值的5倍\n",
    "        price=price*(5*((age+1)/peak_age))\n",
    "    if price<0: price=0\n",
    "    return price\n",
    "\n",
    "# 生成一批数据代表样本数据集\n",
    "def wineset1():\n",
    "    rows=[]\n",
    "    for i in range(300):\n",
    "        # 随机生成年代和等级\n",
    "        rating=random()*50+50\n",
    "        age=random()*50\n",
    "\n",
    "        # 得到一个参考价格\n",
    "        price=wineprice(rating,age)\n",
    "\n",
    "        # 添加一些噪音\n",
    "        price*=(random()*0.2+0.9)\n",
    "\n",
    "        # 加入数据集\n",
    "        rows.append({'input':(rating,age),'result':price})\n",
    "    return rows\n",
    "\n",
    "data = wineset1()      #创建第一批数据集"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60bcb7e2-91c3-49f0-976c-6f740dc06a96",
   "metadata": {},
   "source": [
    "\n",
    "# 样本间的距离\n",
    "\n",
    "要使用k最近邻，首先要知道哪些是最近邻，所以我们还要有一个功能，就是要能计算两个对象之间的相似度。我们这里使用欧几里得距离作为数据间的距离，代表相似度。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "553269e0-9526-4774-988f-8c36edac28b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用欧几里得距离，定义距离\n",
    "def euclidean(v1,v2):\n",
    "    d=0.0\n",
    "    for i in range(len(v1)):\n",
    "        d+=(v1[i]-v2[i])**2\n",
    "    return math.sqrt(d)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91b5ab77-75fe-48ae-959a-94f1fed564eb",
   "metadata": {},
   "source": [
    "# 获取与新数据距离最近的k个样本数据\n",
    "\n",
    "有了样本数据集，和计算两个样本间的距离。我们就可以对待测数据，计算待测数据与样本数据集中每个样本之间的距离。并对所有距离进行排序，获取距离最近的前k个样本。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1bf83e1c-03b0-410d-b52b-28199d37ec4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 计算待测商品和样本数据集中任一商品间的距离。data样本数据集，vec1待测商品\n",
    "def getdistances(data,vec1):\n",
    "    distancelist=[]\n",
    "\n",
    "    # 遍历样本数据集中的每一项\n",
    "    for i in range(len(data)):\n",
    "        vec2=data[i]['input']\n",
    "\n",
    "        # 添加距离到距离列表\n",
    "        distancelist.append((euclidean(vec1,vec2),i))\n",
    "\n",
    "    # 距离排序\n",
    "    distancelist.sort()\n",
    "    return distancelist  #返回距离列表"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc586509-d2ba-45d0-b343-5963f80a8c15",
   "metadata": {},
   "source": [
    "# 根据距离最近的k个样本数据预测新对象的输出值\n",
    "\n",
    "上面的步骤，我们已经获取了距离最近的k个样本对象。那么如何根据k个样本对象的属性和价格计算待测对象的价格呢？\n",
    "\n",
    "**1、简单求均值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c6cb8152-e3d2-43cb-8099-174788575ad7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 对距离值最小的前k个结果求平均\n",
    "def knnestimate(data,vec1,k=5):\n",
    "    # 得到经过排序的距离值\n",
    "    dlist=getdistances(data,vec1)\n",
    "    avg=0.0\n",
    "\n",
    "    # 对前k项结果求平均\n",
    "    for i in range(k):\n",
    "        idx=dlist[i][1]\n",
    "        avg+=data[idx]['result']\n",
    "    avg=avg/k\n",
    "    return avg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73fc5b6f-558a-491e-a59a-98d9817f870f",
   "metadata": {},
   "source": [
    "\n",
    "**2、求加权平均**\n",
    "\n",
    "如果使用直接求均值，有可能存在前k个最近邻中，可能会存在距离很远的数据，但是他仍然属于最近的前K个数据。\n",
    "\n",
    "比如设定了k=5，距离最近的3个样本对象距离待测对象很近，但是第4、5个样本对象已经非常远离了待测对象。\n",
    "\n",
    "当存在这种情况时，对前k个样本数据直接求均值会有偏差，所以使用加权平均，为较远的节点赋予较小的权值，对较近的节点赋予较大的权值。\n",
    "\n",
    "那么具体该怎么根据数据间距离分配权值呢？这里使用三种递减函数作为权值分配方法。\n",
    "\n",
    "三种权值分配的思想就是距离远的权值小，距离近的权值大。\n",
    "\n",
    "**2.1、使用反函数为近邻分配权重。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "3d8d0d03-e2bc-421c-a52d-508da909911c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def inverseweight(dist,num=1.0,const=0.1):\n",
    "    return num/(dist+const)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5acf59f9-fe71-40cf-b189-24ca4482a0df",
   "metadata": {},
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/9576904983d7a0ca86a7a00e53155922.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d747d263-641a-40e9-85d7-17d500d5525d",
   "metadata": {},
   "source": [
    "**2.2、使用减法函数为近邻分配权重。当最近距离都大于const时不可用。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b0fa6b4a-c228-4f14-9919-434c81c066de",
   "metadata": {},
   "outputs": [],
   "source": [
    "def subtractweight(dist,const=1.0):\n",
    "    if dist>const:\n",
    "        return 0\n",
    "    else:\n",
    "        return const-dist"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1bfdd0d-c7e2-4326-a431-2a76c3618644",
   "metadata": {},
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/bb16f674261a55d96aee39bdf1c62610.png)\n",
    "\n",
    "**2.3、使用高斯函数为距离分配权重。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "993b12ab-37da-49cc-9bba-dfab70724416",
   "metadata": {},
   "outputs": [],
   "source": [
    "def gaussian(dist,sigma=5.0):\n",
    "    return math.e**(-dist**2/(2*sigma**2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3033301b-8691-429e-ba82-e1166570a874",
   "metadata": {},
   "source": [
    "![这里写图片描述](https://img-blog.csdnimg.cn/img_convert/c9e41ca14feac83015eaa94490e9e73e.png)\n",
    "\n",
    "有了权值分配方法，下面就可以计算加权平均了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "fbddd69b-5c64-4fc2-a7cb-ca3fc5b78404",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 对距离值最小的前k个结果求加权平均\n",
    "def weightedknn(data,vec1,k=5,weightf=gaussian):\n",
    "    # 得到距离值\n",
    "    dlist=getdistances(data,vec1)\n",
    "    avg=0.0\n",
    "    totalweight=0.0\n",
    "\n",
    "    # 得到加权平均\n",
    "    for i in range(k):\n",
    "        dist=dlist[i][0]\n",
    "        idx=dlist[i][1]\n",
    "        weight=weightf(dist)\n",
    "        avg+=weight*data[idx]['result']\n",
    "        totalweight+=weight\n",
    "    if totalweight==0: return 0\n",
    "    avg=avg/totalweight\n",
    "    return avg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f0addc2-ecc3-4edc-a076-932851f8f767",
   "metadata": {},
   "source": [
    "----------\n",
    "\n",
    "以上内容，我们完成了对k最近邻的模型构造，并完成了对一个待测数据的预测。那么这个测试结果究竟好不好呢？又如何评估我们的建立的模型的好坏呢？下面我们来了解一下交叉验证。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b7e5168-e36e-4fde-936f-e3295e3761fb",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "# 交叉验证\n",
    "\n",
    "\n",
    "交叉验证是一种将样本集划分为训练集和待测集，使用样本集进行模型建立，然后使用模型对待测集进行预测，并查看预测结果与真实结果之间的误差，进而对模型进行评估的方法。每划分一次样本集就相当于建立了一个模型。待测集中每一个对象都可以进行一次测试。\n",
    "\n",
    "> 思考：交叉验证中如何将样本集划分训练集和待测集、测试多少次、有了预测结果如何计算模型好坏、能不能把预测结果返回到新的模型训练中？这些问题读者可以在今后学习中思考\n",
    "\n",
    "交叉验证用来验证你的算法或算法参数的好坏，比如上面的加权分配算法我们有三种方式，究竟哪个更好呢？我们可以使用交叉验证进行查看。\n",
    "\n",
    "随机选择样本数据集中95%作为训练集，5%作为待测集，对待测集中数据进行预测并与已知结果进行比较，计算准确率，查看算法效果。\n",
    "\n",
    "> 读者需要注意，交叉验证中并不是一定使用准确率来判断模型好坏，只不过准确率是用的最多的评估标准。\n",
    "\n",
    "下面使用代码实现交叉验证。\n",
    "\n",
    "首先实现将样本数据集划分为训练集和待测集两个子集的功能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "54253a21-b8e4-498f-9a65-5cc865ed324a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 划分数据。test待测集占的比例。其他数据为训练集\n",
    "def dividedata(data,test=0.05):\n",
    "    trainset=[]\n",
    "    testset=[]\n",
    "    for row in data:\n",
    "        if random()<test:\n",
    "            testset.append(row)\n",
    "        else:\n",
    "            trainset.append(row)\n",
    "    return trainset,testset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af8bbe19-b7ec-4f80-84b1-927054df2ca4",
   "metadata": {},
   "source": [
    "然后再来计算预测结果与真实结果之间的准确率。以此来表示系统模型的好坏。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "6ea91dbd-48aa-4be4-9a9a-4ff56ea2f72b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 对使用算法进行预测的结果的误差进行统计，以此判断算法好坏。algf为算法函数，trainset为训练数据集，testset为待测数据集\n",
    "def testalgorithm(algf,trainset,testset):\n",
    "    error=0.0\n",
    "    for row in testset:\n",
    "        guess=algf(trainset,row['input'])   #这一步要和样本数据的格式保持一致，列表内个元素为一个字典，每个字典包含input和result两个属性。而且函数参数是列表和元组\n",
    "        error+=(row['result']-guess)**2\n",
    "        #print row['result'],guess\n",
    "    #print error/len(testset)\n",
    "    return error/len(testset)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f2cbcf6-d63a-4e48-a344-5483246ae98e",
   "metadata": {},
   "source": [
    "有了数据拆分和算法性能误差统计函数。我们就可以在原始数据集上进行多次这样的拆分统计实验，计算平均误差。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "52ce9b8c-abe1-4ff9-968b-f9f3f4b3fe97",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将数据拆分和误差统计合并在一起。对数据集进行多次划分，并验证算法好坏\n",
    "def crossvalidate(algf,data,trials=100,test=0.1):\n",
    "    error=0.0\n",
    "    for i in range(trials):\n",
    "        trainset,testset=dividedata(data,test)\n",
    "        error+=testalgorithm(algf,trainset,testset)\n",
    "    return error/trials"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e738f91f-319c-4436-a3f9-8a06d91d9fcd",
   "metadata": {},
   "source": [
    "上面我们完成了交叉验证的全部功能，下面我们来尝试一下应用这个功能。\n",
    "\n",
    "交叉验证测试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "3be697f4-2566-46d9-84fd-3367bb8b37fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "平均误差：286.47110528978357\n",
      "平均误差：250.03608805220637\n",
      "平均误差：220.68960367069892\n"
     ]
    }
   ],
   "source": [
    "data = wineset1()      #创建第一批数据集\n",
    "# print(data)\n",
    "error = crossvalidate(knnestimate,data)   #对直接求均值算法进行评估\n",
    "print('平均误差：'+str(error))\n",
    "\n",
    "def knn3(d,v): return knnestimate(d,v,k=3)  #定义一个函数指针。参数为d列表，v元组\n",
    "error = crossvalidate(knn3, data)            #对直接求均值算法进行评估\n",
    "print('平均误差：' + str(error))\n",
    "\n",
    "def knninverse(d,v): return weightedknn(d,v,weightf=inverseweight)  #定义一个函数指针。参数为d列表，v元组\n",
    "error = crossvalidate(knninverse, data)            #对使用反函数做权值分配方法进行评估\n",
    "print('平均误差：' + str(error))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "815a5682-5f1e-40dc-828a-0d75901da814",
   "metadata": {},
   "source": [
    "\n",
    "----------\n",
    "\n",
    "\n",
    "上面我们已经完成了knn模型的建模和使用。以及使用交叉验证完成对模型中算法的选择和模型的评估。\n",
    "\n",
    "\n",
    "----------"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4abc45e0-e515-41da-99e7-0a29465b8888",
   "metadata": {},
   "source": [
    "下面我们来看一些复杂的情况。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df9aaee5-7ba9-452f-a94b-efb11847243e",
   "metadata": {},
   "source": [
    "\n",
    "# 不同类型、尺度的属性，无用属性\n",
    "\n",
    "\n",
    "实际生活中，白酒的价格不仅与年代和等级有关系，还与酒瓶容量等其他属性有很大的关系。\n",
    "所以我们这里来讨论一下当样本数据集中可能存在复杂属性的情况。\n",
    "\n",
    "1、属性类型不同，属性尺度不同的情况\n",
    "\n",
    "在样本数据集中的对象各个属性可能并不是取值范围相同、类型相同的数值，比如上面的酒的属性可能包含档次（0-100），酒的年限（0-50），酒的容量（三种容量375.0ml、750.0ml、1500.0ml），\n",
    "\n",
    "2、存在无效属性的情况\n",
    "\n",
    "在我们获取的样本数据中还有可能包含无用的属性，比如酒箱的号码（1-2000之间的整数）。在计算样本距离时，取值范围大的属性的变化会严重影响取值范围小的属性的变化，以至于结果会忽略取值范围小的属性。而且无用属性的变化也会增加数据之间的距离。比如两瓶非常相似的酒，两瓶酒之间的距离本应为接近0，但是由于两瓶酒的酒箱号码分别为1和100，则在计算两瓶酒之间的距离时就会接近100。而这属于一个很大的距离。\n",
    "\n",
    "\n",
    "针对上面的情况，要求我们能对样本数据的属性进行缩放到合适的范围，并要能删除无效属性。\n",
    "\n",
    "我们先来模拟构造一批数据集，这批数据集本应是网络爬虫获取或市场统计获取。为了简单我们使用代码模拟获取这么一批数据。它包含了酒的档次、年限、容量、酒箱的号码以及酒的价格。\n",
    "\n",
    "**构造新的数据集**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "b34e90bb-1464-4fc9-89ed-23bb75af00ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 构建新数据集，模拟不同类型、尺度的属性\n",
    "def wineset2():\n",
    "    rows=[]\n",
    "    for i in range(300):\n",
    "        rating=random()*50+50   #酒的档次\n",
    "        age=random()*50         #酒的年限\n",
    "        aisle=float(randint(1,2000))  #酒箱的号码（无关属性）\n",
    "        bottlesize=[375.0,750.0,1500.0][randint(0,2)]  #酒的容量\n",
    "        price=wineprice(rating,age)  #酒的价格\n",
    "        price*=(bottlesize/750)\n",
    "        price*=(random()*0.2+0.9)\n",
    "        rows.append({'input':(rating,age,aisle,bottlesize),'result':price})\n",
    "    return rows"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc8542a0-fc0c-4907-94b5-15c593a8458a",
   "metadata": {},
   "source": [
    "**实现按比例对属性的取值进行缩放的功能**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "ac8e9f31-2429-4176-a619-c4c9bda402b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按比例对属性进行缩放，scale为各属性的值的缩放比例。\n",
    "def rescale(data,scale):\n",
    "    scaleddata=[]\n",
    "    for row in data:\n",
    "        scaled=[scale[i]*row['input'][i] for i in range(len(scale))]\n",
    "        scaleddata.append({'input':scaled,'result':row['result']})\n",
    "    return scaleddata"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "353295ea-4cca-4290-9024-6775a33d318f",
   "metadata": {},
   "source": [
    "\n",
    "那就还剩一个问题，究竟各个属性缩放多少才合适呢。这是一个优化问题，我们可以通过优化技术寻找最优化解。\n",
    "\n",
    "优化，就是寻找函数极值点的问题。寻找在一定自变量范围内使函数值最小或最大的自变量的取值。函数一般设定为成本函数，自变量为函数的参数，也就是算法的参数。参数和函数值可以是离散或连续值。\n",
    "\n",
    "\n",
    "比如属性比例缩放，每个属性的缩放比例就是自变量，成本函数就是预测结果的错误率，寻找方法就是简单的暴力尝试每一个自变量的值，寻找使成本函数最小的自变量。\n",
    "\n",
    "而需要优化的成本函数，就是通过缩放以后进行预测的结果与真实结果之间的误差值。误差值越小越好。误差值的计算同前面交叉验证时使用的相同crossvalidate函数\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "820da590-dc8e-4e06-99e2-37ab6095a2b6",
   "metadata": {},
   "source": [
    "下面构建成本函数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "9f31fd67-dc5c-4df6-85b3-57490b0b197c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 生成成本函数。\n",
    "def createcostfunction(algf,data):\n",
    "    def costf(scale):\n",
    "        sdata=rescale(data,scale)\n",
    "        return crossvalidate(algf,sdata,trials=10)\n",
    "    return costf\n",
    "\n",
    "weightdomain=[(0,10)]*4     #将缩放比例这个题解的取值范围设置为0-10，可以自己设定，用于优化算法"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f220872-b9a5-4213-8a91-25667f12ae17",
   "metadata": {},
   "source": [
    "我们通过优化技术查找到了所有属性合适的缩放比例，我们也就可以对样本数据集的属性进行缩放后建立k最近邻模型。当有待测数据到来时，我们也对待测对象的属性进行缩放后，与模型中的对象（比例缩放后的样本对象）进行距离计算，获取k最近邻。\n",
    "\n",
    "测试代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "63481bd6-1a45-4ecf-aadc-e668d5b73ea4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looking in indexes: https://mirrors.tencent.com/pypi/simple/, https://mirrors.tencent.com/repository/pypi/tencent_pypi/simple\n",
      "Requirement already satisfied: optimization in /usr/local/lib/python3.8/dist-packages (0.0.1)\n",
      "Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (from optimization) (1.3.5)\n",
      "Collecting argparse\n",
      "  Using cached https://mirrors.tencent.com/pypi/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl (23 kB)\n",
      "Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from optimization) (4.64.1)\n",
      "Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/dist-packages (from optimization) (3.5.3)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (1.4.4)\n",
      "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (4.37.1)\n",
      "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (0.11.0)\n",
      "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (9.2.0)\n",
      "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (21.3)\n",
      "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (1.22.1)\n",
      "Requirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (3.0.6)\n",
      "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.8/dist-packages (from matplotlib->optimization) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas->optimization) (2021.3)\n",
      "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7->matplotlib->optimization) (1.16.0)\n",
      "Installing collected packages: argparse\n",
      "Successfully installed argparse-1.4.0\n",
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n",
      "\u001b[33mWARNING: You are using pip version 21.3.1; however, version 22.2.2 is available.\n",
      "You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "! pip install optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "e1cfe1ea-c382-40fd-a31d-76e54732e57a",
   "metadata": {},
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'optimization'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "Input \u001b[0;32mIn [44]\u001b[0m, in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      2\u001b[0m data \u001b[38;5;241m=\u001b[39m wineset2()  \u001b[38;5;66;03m# 创建第2批数据集\u001b[39;00m\n\u001b[1;32m      3\u001b[0m \u001b[38;5;66;03m# print(data)\u001b[39;00m\n\u001b[0;32m----> 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01moptimization\u001b[39;00m\n\u001b[1;32m      5\u001b[0m costf\u001b[38;5;241m=\u001b[39mcreatecostfunction(knnestimate,data)      \u001b[38;5;66;03m#创建成本函数\u001b[39;00m\n\u001b[1;32m      6\u001b[0m result \u001b[38;5;241m=\u001b[39m optimization\u001b[38;5;241m.\u001b[39mannealingoptimize(weightdomain,costf,step\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m2\u001b[39m)    \u001b[38;5;66;03m#使用退火算法寻找最优解\u001b[39;00m\n",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'optimization'"
     ]
    }
   ],
   "source": [
    "# ======================缩放比例优化======================\n",
    "data = wineset2()  # 创建第2批数据集\n",
    "# print(data)\n",
    "import optimization\n",
    "costf=createcostfunction(knnestimate,data)      #创建成本函数\n",
    "result = optimization.annealingoptimize(weightdomain,costf,step=2)    #使用退火算法寻找最优解\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89d4a944-5bd6-4c02-99c6-38d04cd24abb",
   "metadata": {},
   "source": [
    "\n",
    "上面我们又解决了一种带有不同类型、尺度的属性，无用属性等复杂属性的情况，这是数据来源的问题，下面我们将面临最后一个问题，是输出结果的问题。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01ff899c-7a79-4791-b6ea-9a7e9e566bba",
   "metadata": {},
   "source": [
    "\n",
    "预测结果的不对称分布\n",
    "-----\n",
    "\n",
    "我们上面建立的酒的价格模型。这是正规市场中的酒。但是如何这酒是逃税酒，那么逃税酒的价格仍然和酒的等级、年份、容量有相似的关系，不过比正规酒要便宜40%。而我们买一瓶酒时并不知道这是正规酒就还是逃税酒。因此当我们预测价格时我们更情愿说如果这酒是正规酒，则价格大约是m1元，如果是逃税酒，这价格是m2元。\n",
    "\n",
    "当然还有另外一种结果输出的形式，就是n1%的概率是m1元，n2%的概率是m2元，这种概率输出方式。\n",
    "\n",
    "在数据挖掘中，对于样本数据集包含多种分布情况时，待测对象的预测结果我们也希望不唯一的表示。我们可以使用概率结果进行表示，输出每种结果的值和出现的概率。\n",
    "\n",
    "我们模拟一批数据，样本数据中价格存在正规酒和逃税酒这两种形式的分布。\n",
    "\n",
    "**构造数据集**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "067b858c-f2ae-4c78-b3da-837580c716a2",
   "metadata": {},
   "outputs": [],
   "source": [
    "def wineset3():\n",
    "    rows=wineset1()\n",
    "    for row in rows:\n",
    "        if random()<0.5:\n",
    "            # 一半的可能是逃税酒\n",
    "            row['result']*=0.6\n",
    "    return rows"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d702090c-53e0-4e86-b25d-ffbb6f952b27",
   "metadata": {},
   "source": [
    "如果以结果概率的形式存在，首先我们要能够计算概率值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "9be52ef9-8d0d-49b3-9141-45cfe0285ee5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 计算概率。data样本数据集，vec1预测数据，low，high结果范围，weightf为根据距离进行权值分配的函数\n",
    "def probguess(data,vec1,low,high,k=5,weightf=gaussian):\n",
    "    dlist=getdistances(data,vec1)  #获取距离列表\n",
    "    nweight=0.0\n",
    "    tweight=0.0\n",
    "\n",
    "    for i in range(k):\n",
    "        dist=dlist[i][0]   #距离\n",
    "        idx=dlist[i][1]   #酒箱的号码\n",
    "        weight=weightf(dist)  #权值\n",
    "        v=data[idx]['result']  #真实结果\n",
    "\n",
    "        # 当前数据点位于指定范围内么？\n",
    "        if v>=low and v<=high:\n",
    "            nweight+=weight    #指定范围的权值之和\n",
    "        tweight+=weight        #总的权值之和\n",
    "    if tweight==0: return 0\n",
    "\n",
    "    # 概率等于位于指定范围内的权重值除以所有权重值\n",
    "    return nweight/tweight"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80a56763-9480-4345-a936-b76e44cab14d",
   "metadata": {},
   "source": [
    "对于多种输出、以概率和值的形式表示的结果，我们可以使用累积概率分布图或概率密度图的形式表现。\n",
    "\n",
    "**绘制累积概率分布图**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "f9f9863d-003f-411b-b40d-91d1963d72b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pylab import *\n",
    "\n",
    "# 绘制累积概率分布图。data样本数据集，vec1预测数据，high取值最高点，k近邻范围，weightf权值分配\n",
    "def cumulativegraph(data,vec1,high,k=5,weightf=gaussian):\n",
    "    t1=arange(0.0,high,0.1)\n",
    "    cprob=array([probguess(data,vec1,0,v,k,weightf) for v in t1])   #预测产生的不同结果的概率\n",
    "    plot(t1,cprob)\n",
    "    show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bd620a8-6c68-4a2f-8a17-64048c0430ef",
   "metadata": {},
   "source": [
    "**绘制概率密度图**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "39c5e3d4-e901-400f-8dd6-8cb02b7c9cb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 绘制概率密度图\n",
    "def probabilitygraph(data,vec1,high,k=5,weightf=gaussian,ss=5.0):\n",
    "    # 建立一个代表价格的值域范围\n",
    "    t1=arange(0.0,high,0.1)\n",
    "\n",
    "    # 得到整个值域范围内的所有概率\n",
    "    probs=[probguess(data,vec1,v,v+0.1,k,weightf) for v in t1]\n",
    "\n",
    "    # 通过加上近邻概率的高斯计算结果，对概率值做平滑处理\n",
    "    smoothed=[]\n",
    "    for i in range(len(probs)):\n",
    "        sv=0.0\n",
    "        for j in range(0,len(probs)):\n",
    "            dist=abs(i-j)*0.1\n",
    "            weight=gaussian(dist,sigma=ss)\n",
    "            sv+=weight*probs[j]\n",
    "        smoothed.append(sv)\n",
    "    smoothed=array(smoothed)\n",
    "\n",
    "    plot(t1,smoothed)\n",
    "    show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "f4f7dc0d-6eb4-43f2-b40d-d21e017908da",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYc0lEQVR4nO3df5BdZX3H8feHxYA/YSFbqtmQRLuoUSzoNtLSFkYlLNQx/mg7Aa2xdcx0RqzFH50wOmDj2NqprdZORKOmqKOkNFq706aNkR91RsHupiCSYGAJ1eyKshrA+qMk2f32j/MsHC572bPZc++59+znNbOz9zznnLvP4YTPPvs8zz2PIgIzM6uv46qugJmZtZaD3sys5hz0ZmY156A3M6s5B72ZWc0dX3UFGi1dujRWrlxZdTXMzLrKnj17fhQRfbPt67igX7lyJaOjo1VXw8ysq0j6brN97roxM6s5B72ZWc056M3Mas5Bb2ZWcw56M7OamzPoJW2TdL+kO5rsl6SPShqTdLukF+f2bZB0d/raUGbFzcysmCIt+muAoSfYfxEwkL42AlcDSDoFuAp4KbAGuEpS70Iqa2Zm8zfnPPqI+JqklU9wyDrgs5E97/gWSSdLeiZwPrA7Ig4BSNpN9gvj2gXX2sxq69DPDvP5W77LkanpqqvSdr980pO59KWnl/6+ZXxgahlwMLc9nsqalT+OpI1kfw1w+unlX6SZdY9de3/A3+y+CwCp4sq02VnLT+7YoF+wiNgKbAUYHBz0Sihmi9jR1JIffe8rWPq0EyquTT2UMetmAlie2+5PZc3Kzcyamk5NveMWW3O+hcoI+mHgjWn2zTnAQxFxH7ALWCupNw3Crk1lZmZNTaflTY9zzpdmzq4bSdeSDawulTRONpPmSQAR8XFgJ3AxMAb8HPjDtO+QpPcDI+mtNs8MzJqZNTPTopdb9KUpMuvmkjn2B/DWJvu2AduOrWpmthiFW/Sl8ydjzayjPNp146Qvi4PezDqKB2PL56A3s44y06J3zpfHQW9mHSXcoi+dg97MOsr0tAdjy+agN7OO4j768jnozayjuI++fA56M+soEYHkD0yVyUFvZh1lOtxtUzYHvZl1lOkID8SWzEFvZh1lOtxtUzYHvZl1lHCLvnQOejPrKFnXjZO+TA56M+soHowtn4PezDrKdJpeaeUpFPSShiTtlzQmadMs+1dIul7S7ZJuktSf2zcl6bb0NVxm5c2sfsIt+tIVWWGqB9gCXACMAyOShiNiX+6wDwGfjYjPSHoZ8JfAH6R9v4iIs8qttpnVladXlq9Ii34NMBYRByLiMLAdWNdwzGrghvT6xln2m5kV4sHY8hUJ+mXAwdz2eCrL+xbw2vT6NcDTJZ2atk+UNCrpFkmvXkhlzaz+PI++fGUNxr4LOE/SrcB5wAQwlfatiIhB4FLgI5Ke03iypI3pl8Ho5ORkSVUys27kefTlKxL0E8Dy3HZ/KntERHw/Il4bEWcD70llD6bvE+n7AeAm4OzGHxARWyNiMCIG+/r6juEyzKwupqc9GFu2IkE/AgxIWiVpCbAeeMzsGUlLJc281xXAtlTeK+mEmWOAc4H8IK6Z2WN4MLZ8cwZ9RBwFLgN2AXcC10XEXkmbJb0qHXY+sF/SXcBpwAdS+fOBUUnfIhuk/WDDbB0zs8dwH3355pxeCRARO4GdDWVX5l7vAHbMct43gDMXWEczW0QiguP8Uc5S+T+nmXUUT68sn4PezDqKn3VTvkJdN2a2OD18dIrXXf0NfvDQw237mT/5vyP09z65bT9vMXDQm1lTD/38CHdM/IRfW9nLwGlPb9vPPefZp859kBXmoDezpqYj+/6as/u59KWnV1sZO2buozezpoIs6T2vvbs56M2sqZkWvQdHu5uD3syamk5J75zvbg56M2sq3KKvBQe9mTU1HW7R14GD3syamgl6t+i7m4PezJqaGYx1znc3B72ZPQG36OvA
QW9mTXl6ZT046M2sqUf76CuuiC2Ig97Mmpqezr57IZDuVijoJQ1J2i9pTNKmWfavkHS9pNsl3SSpP7dvg6S709eGMitvZq3lFn09zBn0knqALcBFwGrgEkmrGw77EPDZiHgRsBn4y3TuKcBVwEuBNcBVknrLq76ZtVI8MuvGSd/NirTo1wBjEXEgIg4D24F1DcesBm5Ir2/M7b8Q2B0RhyLiAWA3MLTwaptZO/ihZvVQJOiXAQdz2+OpLO9bwGvT69cAT5d0asFzkbRR0qik0cnJyaJ1N7MW86ybeihrMPZdwHmSbgXOAyaAqaInR8TWiBiMiMG+vr6SqmRmC+VHINRDkYVHJoDlue3+VPaIiPg+qUUv6WnA6yLiQUkTwPkN5960gPqaWRuFH4FQC0Va9CPAgKRVkpYA64Hh/AGSlkqaea8rgG3p9S5graTeNAi7NpWZWRdw1009zBn0EXEUuIwsoO8ErouIvZI2S3pVOux8YL+ku4DTgA+kcw8B7yf7ZTECbE5lZtYFZp5H78HY7lZozdiI2AnsbCi7Mvd6B7CjybnbeLSFb2ZdJDXoPb2yy/mTsWbWlAdj68FBb2ZNeYWpenDQm1lTfgRCPTjozaypaT8CoRYc9GbWlFv09eCgN7Pm3EdfCw56M2vKi4PXg4PezJry4uD14KA3s6Y8j74eHPRm1pQfalYPDnoza8oPNasHB72ZNeXplfVQ6KFmZrZw09PBaz72dQ786GdVV6WwI1PTABznpO9qDnqzNjk8Nc23xh/iJSt6eVH/SVVXp7Depyxh1alPrboatgAOerM2mUod3hesPo0/Pu85FdfGFpNCffSShiTtlzQmadMs+0+XdKOkWyXdLuniVL5S0i8k3Za+Pl72BZh1i6nU3328u0GszeZs0UvqAbYAFwDjwIik4YjYlzvsvWQrT10taTXZIiUr0757IuKsUmtt1oUeXa3JQW/tVaRFvwYYi4gDEXEY2A6sazgmgGek1ycB3y+vimb1cDQF/fE9DnprryJBvww4mNseT2V57wPeIGmcrDX/tty+ValL5z8l/dZsP0DSRkmjkkYnJyeL196si7hFb1Upax79JcA1EdEPXAx8TtJxwH3A6RFxNvAO4AuSntF4ckRsjYjBiBjs6+srqUpmncV99FaVIkE/ASzPbfensrw3A9cBRMTNwInA0oh4OCJ+nMr3APcAZyy00mbd6OhUatE76K3NigT9CDAgaZWkJcB6YLjhmO8BLweQ9HyyoJ+U1JcGc5H0bGAAOFBW5c26ycynTHvcdWNtNuesm4g4KukyYBfQA2yLiL2SNgOjETEMvBP4pKTLyQZm3xQRIem3gc2SjgDTwB9HxKGWXY1ZB5vyYKxVpNAHpiJiJ9kga77sytzrfcC5s5z3ReCLC6yjWS1MeTDWKuKHmpm1iQdjrSoOerM28WCsVcVBb9Ym027RW0Uc9GZt8kgfvYPe2sxBb9YmM0Hv6ZXWbg56szZ5ZHqlW/TWZg56szaZmXXjrhtrNy88YtZif3/93Xxl3w/56cNHAehx0FubOejNWuzfvn0fP/rpw7yo/2TOXHYSz/vlp1ddJVtkHPRmbfDi03vZ+sbBqqthi5T76M3Mas5Bb9ZiEeAZlVYlB72ZWc056M1aLAiEm/RWHQe9WYu568aq5qA3M6u5QkEvaUjSfkljkjbNsv90STdKulXS7ZIuzu27Ip23X9KFZVberBsEbtFbteacR5/WfN0CXACMAyOShtOqUjPeC1wXEVdLWk22GtXK9Ho98ALgWcBXJZ0REVNlX4iZmc2uSIt+DTAWEQci4jCwHVjXcEwAz0ivTwK+n16vA7ZHxMMRcS8wlt7PbNGI8GCsVatI0C8DDua2x1NZ3vuAN0gaJ2vNv20e5yJpo6RRSaOTk5MFq27WRZzzVqGyBmMvAa6JiH7gYuBzkgq/d0RsjYjBiBjs6+srqUpmnSGqroAtekWedTMBLM9t96eyvDcDQwARcbOkE4Gl
Bc81q7dwg96qVaTVPQIMSFolaQnZ4OpwwzHfA14OIOn5wInAZDpuvaQTJK0CBoD/KqvyZmY2tzlb9BFxVNJlwC6gB9gWEXslbQZGI2IYeCfwSUmXk/2l+qaICGCvpOuAfcBR4K2ecWOLTTa90m16q06hxxRHxE6yQdZ82ZW51/uAc5uc+wHgAwuoo1nXc8xblfzJWLMWi/BwrFXLQW/WYv5krFXNQW9mVnMOerMWC0+vtIo56M3Mas5Bb9ZiQXh6pVXKQW/WBo55q5KD3qzFPLvSquagN2uxCNykt0o56M3Mas5Bb9YGXnjEquSgN2sDT7qxKjnozVrMz7qxqjnozVrMY7FWNQe9mVnNFQp6SUOS9ksak7Rplv0flnRb+rpL0oO5fVO5fY0rU5nVXoT76K1acy48IqkH2AJcAIwDI5KG02IjAETE5bnj3wacnXuLX0TEWaXV2KwLedaNValIi34NMBYRByLiMLAdWPcEx18CXFtG5czqIPBgrFWrSNAvAw7mtsdT2eNIWgGsAm7IFZ8oaVTSLZJe3eS8jemY0cnJyWI1N+si7rqxKpU9GLse2NGwAPiKiBgELgU+Iuk5jSdFxNaIGIyIwb6+vpKrZFYtz660qhUJ+glgeW67P5XNZj0N3TYRMZG+HwBu4rH992a156UErWpFgn4EGJC0StISsjB/3OwZSc8DeoGbc2W9kk5Ir5cC5wL7Gs81M7PWmXPWTUQclXQZsAvoAbZFxF5Jm4HRiJgJ/fXA9njsxwCfD3xC0jTZL5UP5mfrmC0G2f8RbtJbdeYMeoCI2AnsbCi7smH7fbOc9w3gzAXUz6wW3HVjVfInY81azqOxVi0HvVkbuEFvVXLQm7WYp1da1Rz0Zi3m6ZVWNQe9WRv4WTdWJQe9WYt54RGrmoPerA3cdWNVctCbtZjb81Y1B71Zi0V4eqVVy0Fv1gZy341VyEFv1mIejLWqOejNzGrOQW/WYm7PW9Uc9GZt4C56q5KD3qzVwp+MtWoVCnpJQ5L2SxqTtGmW/R+WdFv6ukvSg7l9GyTdnb42lFh3s67grhur2pwLj0jqAbYAFwDjwIik4fxKURFxee74t5HWhZV0CnAVMEj2731POveBUq/CrMO568aqVKRFvwYYi4gDEXEY2A6se4LjL+HRBcIvBHZHxKEU7ruBoYVU2KzbeHqlVa1I0C8DDua2x1PZ40haAawCbpjPuZI2ShqVNDo5OVmk3mZdxQ16q1LZg7HrgR0RMTWfkyJia0QMRsRgX19fyVUyq5bb81a1IkE/ASzPbfenstms59Fum/mea1ZLEe6jt2oVCfoRYEDSKklLyMJ8uPEgSc8DeoGbc8W7gLWSeiX1AmtTmdmi4mfdWJXmnHUTEUclXUYW0D3AtojYK2kzMBoRM6G/HtgeuZGniDgk6f1kvywANkfEoXIvwayzhTtvrGJzBj1AROwEdjaUXdmw/b4m524Dth1j/cxqwe15q5I/GWvWYp5daVVz0Ju1g5v0ViEHvVmLBX7WjVXLQW/Wau66sYo56M3awLMrrUoOerMW8/RKq5qD3qwN3KC3KjnozVrMj0CwqjnozcxqzkFv1mKeXmlVc9CbtZgXHrGqOejN2sB99FYlB71Zi7k9b1Vz0Ju1gRv0ViUHvVmLReC+G6uUg97MrOYKBb2kIUn7JY1J2tTkmN+XtE/SXklfyJVPSbotfT1uCUKzxcDteavSnCtMSeoBtgAXAOPAiKThiNiXO2YAuAI4NyIekPRLubf4RUScVW61zbqDp1ZaJyjSol8DjEXEgYg4DGwH1jUc8xZgS0Q8ABAR95dbTbPu5i56q1KRoF8GHMxtj6eyvDOAMyR9XdItkoZy+06UNJrKXz3bD5C0MR0zOjk5OZ/6m3W0mQa9PxlrVSq0OHjB9xkAzgf6ga9JOjMiHgRWRMSEpGcDN0j6dkTckz85IrYCWwEGBwf9t66ZWYmKtOgngOW57f5UljcODEfEkYi4F7iLLPiJ
iIn0/QBwE3D2Auts1jVmWi3uurEqFQn6EWBA0ipJS4D1QOPsmS+TteaRtJSsK+eApF5JJ+TKzwX2YbZIeDDWOsGcXTcRcVTSZcAuoAfYFhF7JW0GRiNiOO1bK2kfMAW8OyJ+LOk3gE9Imib7pfLB/Gwds8XCDXqrUqE++ojYCexsKLsy9zqAd6Sv/DHfAM5ceDXNupO7bqwT+JOxZmY156A3a6FHple6SW8VctCbtVD4IcXWARz0ZmY156A3a6FHu26qrYctbg56M7Oac9CbtYGfdWNVctCbmdWcg96shdxHb53AQW/WQp5eaZ3AQW/WBm7QW5Uc9GYt5K4b6wQOejOzmnPQm7XQI0+vdOeNVchBb2ZWc4WCXtKQpP2SxiRtanLM70vaJ2mvpC/kyjdIujt9bSir4mbdYGaFKffRW5XmXHhEUg+wBbiAbG3YEUnD+ZWiJA0AVwDnRsQDkn4plZ8CXAUMkv0Vuyed+0D5l2LWeTy50jpBkRb9GmAsIg5ExGFgO7Cu4Zi3AFtmAjwi7k/lFwK7I+JQ2rcbGCqn6mZmVkSRoF8GHMxtj6eyvDOAMyR9XdItkobmcS6SNkoalTQ6OTlZvPZmHc4Lj1gnKGsw9nhgADgfuAT4pKSTi54cEVsjYjAiBvv6+kqqkpmZQbHFwSeA5bnt/lSWNw58MyKOAPdKuoss+CfIwj9/7k3HWtkn8uDPD/N7H7+5FW9tdsymZgZjK66HLW5Fgn4EGJC0iiy41wOXNhzzZbKW/D9IWkrWlXMAuAf4C0m96bi1ZIO2pTvuODFw2tNa8dZmC/LCZ53E+c/1X6pWnTmDPiKOSroM2AX0ANsiYq+kzcBoRAynfWsl7QOmgHdHxI8BJL2f7JcFwOaIONSKC3nGiU/iY69/SSve2sysq2lmnm+nGBwcjNHR0aqrYWbWVSTtiYjB2fb5k7FmZjXnoDczqzkHvZlZzTnozcxqzkFvZlZzDnozs5pz0JuZ1VzHzaOXNAl8dwFvsRT4UUnVqVJdrgN8LZ3K19J5FnIdKyJi1o9gd1zQL5Sk0WYfGugmdbkO8LV0Kl9L52nVdbjrxsys5hz0ZmY1V8eg31p1BUpSl+sAX0un8rV0npZcR+366M3M7LHq2KI3M7McB72ZWc3VJuglDUnaL2lM0qaq6zMfkpZLulHSPkl7Jb09lZ8iabeku9P33rneqxNI6pF0q6R/TdurJH0z3Zt/lLSk6joWIelkSTskfUfSnZJ+vYvvyeXp39Ydkq6VdGK33BdJ2yTdL+mOXNms90GZj6Zrul3Si6ur+eM1uZa/Tv/Gbpf0z/n1tiVdka5lv6QLj/Xn1iLoJfUAW4CLgNXAJZJWV1ureTkKvDMiVgPnAG9N9d8EXB8RA8D1absbvB24M7f9V8CHI+JXgAeAN1dSq/n7O+A/IuJ5wK+SXVPX3RNJy4A/AQYj4oVkK8Wtp3vuyzXAUENZs/twEdl61QPARuDqNtWxqGt4/LXsBl4YES8C7iItt5oyYD3wgnTOx1LWzVstgh5YA4xFxIGIOAxsB9ZVXKfCIuK+iPjv9Pp/yQJlGdk1fCYd9hng1ZVUcB4k9QO/A3wqbQt4GbAjHdIt13ES8NvApwEi4nBEPEgX3pPkeODJko4HngLcR5fcl4j4GtC4BGmz+7AO+GxkbgFOlvTMtlS0gNmuJSK+EhFH0+YtQH96vQ7YHhEPR8S9wBhZ1s1bXYJ+GXAwtz2eyrqOpJXA2cA3gdMi4r606wfAaVXVax4+AvwZMJ22TwUezP1D7pZ7swqYJFvw/lZJn5L0VLrwnkTEBPAh4HtkAf8QsIfuvC8zmt2Hbs+CPwL+Pb0u7VrqEvS1IOlpwBeBP42In+T3RTYPtqPnwkp6JXB/ROypui4lOB54MXB1RJwN/IyGbppuuCcAqf96Hdkvr2cBT+Xx3Qddq1vuw1wkvYesG/fzZb93XYJ+Alie2+5PZV1D0pPIQv7zEfGlVPzDmT870/f7q6pf
QecCr5L0P2TdZy8j6+c+OXUZQPfcm3FgPCK+mbZ3kAV/t90TgFcA90bEZEQcAb5Edq+68b7MaHYfujILJL0JeCXw+nj0w02lXUtdgn4EGEizCJaQDWAMV1ynwlI/9qeBOyPib3O7hoEN6fUG4F/aXbf5iIgrIqI/IlaS3YMbIuL1wI3A76bDOv46ACLiB8BBSc9NRS8H9tFl9yT5HnCOpKekf2sz19J19yWn2X0YBt6YZt+cAzyU6+LpSJKGyLo7XxURP8/tGgbWSzpB0iqyAeb/OqYfEhG1+AIuJhuxvgd4T9X1mWfdf5PsT8/bgdvS18Vk/dvXA3cDXwVOqbqu87im84F/Ta+fnf6BjgH/BJxQdf0KXsNZwGi6L18Gerv1ngB/DnwHuAP4HHBCt9wX4FqysYUjZH9pvbnZfQBENgPvHuDbZDONKr+GOa5ljKwvfub//Y/njn9Pupb9wEXH+nP9CAQzs5qrS9eNmZk14aA3M6s5B72ZWc056M3Mas5Bb2ZWcw56M7Oac9CbmdXc/wPAf2na8BJWjAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/NK7nSAAAACXBIWXMAAAsTAAALEwEAmpwYAAAlYUlEQVR4nO3deXRV9b3+8fcn80SAQBiEQADjgIAMR0a1WhWxTigOoFaxVUTFsdde++tdba/a0dahiiDiVBVRwLZYrYiKI6AkiMxDQJEwhnkmJPn8/sjRG22UQA7sc06e11pnkT0dnt0un2z28N3m7oiISPxKCDqAiIgcXip6EZE4p6IXEYlzKnoRkTinohcRiXNJQQf4tqZNm3p+fn7QMUREYkpRUdFGd8+taVnUFX1+fj6FhYVBxxARiSlmtvK7lunUjYhInKtV0ZvZADNbYmbFZnb3d6xzmZktNLMFZjau2vxrzGxZ+HNNpIKLiEjtHPDUjZklAiOBs4ASYJaZTXb3hdXWKQB+AfRz9y1m1iw8Pwf4NRACHCgKb7sl8rsiIiI1qc0RfU+g2N1XuHsZMB648FvrXA+M/KrA3X1DeP7ZwFR33xxeNhUYEJnoIiJSG7Up+lbAqmrTJeF51R0DHGNmH5nZTDMbcBDbiojIYRSpu26SgALgNKA18L6Zda7txmY2DBgG0KZNmwhFEhERqN0R/Wogr9p06/C86kqAye6+390/B5ZSVfy12RZ3H+PuIXcP5ebWeBuoiIgcotoc0c8CCsysHVUlPRi44lvr/AMYAjxtZk2pOpWzAlgO/M7MGofX60/VRduI211Wzqh3l5OYYCQlGIkJCeE/jbTkRDJSqj6ZqUmkpySSnZZM44xkGmWkkJhghyOSiEhUOGDRu3u5mY0ApgCJwFPuvsDM7gEK3X1yeFl/M1sIVAB3ufsmADO7l6pfFgD3uPvmw7Eju/ZVMHJaMZUHOby+GTRMT6ZxRgpNMlNo3jCNFtlptGyYRvPsNI5qlEZeTga5WamY6ReCiMQei7YXj4RCIa/Lk7GVlU6FOxWVVZ/yCmdveQW79pWzu6yC3WVVP2/fu58tu8rYvHs/W3eXsXlXGRt37mP99n2s27aXPfsrvvG9GSmJtMnJoE1OBvlNMzm6WRYFzbIoaN6ArNSoe8BYROoZMyty91BNy+KuoRISjASM5MT/m9eQ5IP6Dndn+95y1m3by5qte/hy825WbtrNl5t38fnGXby7tJSy8sqv12/VKJ2C5ll0bJlN51YN6dSqIa0bp+tfACISFeKu6CPBzGiYnkzD9GSObdHgP5ZXVDqrNu9m6fodLNuwk6Xrd7Bk3Q4+XLaR8vC5o4bpyXRqlU3XvEb0aNuY7m0a0ygj5UjvioiIiv5QJCYY+U0zyW+aSf8T/m/+3v0VLF2/g3mrtzF/9Xbmr97G4++t+Lr8O+Rm0qNtY07Kz6FPhya0bpwR0B6ISH2ioo+gtOREurRuRJfWjb6et6esgrklWyn6cguzV25h6sL1vFxYAkBeTjp92zelT4cm9D26Cc0apAWUXETimYr+MEtPSaRX+yb0at8EqDr/v3T9TqYv38iM5Zv49/y1vFRY9fBwx5bZnHZsLqcd24zubRqRlKjBRUWk7uLurptYU1HpLFyznfeXlfLeklKKvtxCRaXTIC2JUwty6X9Cc04/rhnZaQd3QVlE6pfvu+tGRR9ltu3Zz0fFG3l3yQbeWVzKxp37SE40erdvQv8TWtC/Y3OaZ+sUj4h8k4o+RlVWOp+u2sqbC9YxZcE6vti0G4Ce+Tmcf2JLzunckqZZqQGnFJFooKKPA+5O8YadvD5vHa/OXUPxhp0kGPTt0JTzT2zJgE4taZiu0zsi9ZWKPs64O0vW7+Bfn63l1blrWLlpN6lJCZzVsTmDerTmlKOb6kKuSD2joo9j7s5nJdt4ZXYJkz9bw9bd+8ltkMpF3VpxWag1Rzf7zwe+RCT+qOjr
iX3lFUxbXMqk2SVMW7yB8krnpPzGDOnZhh91bkla9XEhRCSuqOjroY079/HK7BJe/GQVn2/cRXZaEhd3b82VvdpQ0FxH+SLxRkVfj7k7M1ds5sVPvuSN+esoq6ikb4cmXNM3nzOPb66x+EXiRL0avVK+yczo06EJfTo0YfOuMsbP+pLnZ6zkhueKaNUonat6t2XwSXk0ztSAayLxSkf09VB5RSVvLVrPs9NXMmPFJtKSE7ikR2t+enJ72jXNDDqeiBwCnbqR77R43Xae+vBz/vHpGvZXVnLW8c0Zdmp7erRtrPH0RWKIil4OaMOOvfxt+kqem7mSbXv2071NI2467WjOOL6ZCl8kBqjopdZ2l5UzobCEMe+vYPXWPRzXogE3nX4053ZuqQu3IlFMRS8HbX9FJZPnrOGxd4tZXrqL/CYZ3HhaBy7u3ppkPXUrEnVU9HLIKiudNxeu49FpxcxfvZ3WjdO55YdHq/BFooyKXurM3Xl3SSkPvrWUuSXbyMtJ55bTC7ioeysVvkgUUNFLxLg705Zs4MGpy5i3ehttm2Rw+5kFXHBiK53DFwnQ9xW9DsXkoJgZPzyuOZNH9OPJa0JkpiRxx0uf8aOHP2DqwvVE24GDiKjo5RCZGWcc35x/3XIyjwzpRllFJdf/rZBBo6YzY/mmoOOJSDW1KnozG2BmS8ys2MzurmH5UDMrNbM54c911ZZVVJs/OZLhJXgJCcb5Jx7Fm3ecyu8v7syarXsZ8sRMhj79CUvW7Qg6nohQi3P0ZpYILAXOAkqAWcAQd19YbZ2hQMjdR9Sw/U53z6ptIJ2jj21791fw7PQvGDmtmJ37yrkslMedZx1DM73nVuSwqus5+p5AsbuvcPcyYDxwYSQDSvxIS07khh904L27Tmdo33ZMml3CD+5/lwenLmXXvvKg44nUS7Up+lbAqmrTJeF53zbIzOaa2UQzy6s2P83MCs1sppkNrOkvMLNh4XUKS0tLax1eolfjzBR+dX5H3rrzB/zwuGY8/PYyfviXd3lldgmVlbpgK3IkRepi7KtAvrt3AaYCz1Zb1jb8z4krgIfMrMO3N3b3Me4ecvdQbm5uhCJJNGjbJJORV3Zn0o19adEwnTtf/oyLRk1n9pdbgo4mUm/UpuhXA9WP0FuH533N3Te5+77w5FigR7Vlq8N/rgDeBbrVIa/EqB5tG/P3G/vyl0tPZO3WPVz82HRuH/8pa7ftCTqaSNyrTdHPAgrMrJ2ZpQCDgW/cPWNmLatNXgAsCs9vbGap4Z+bAv2AhUi9lJBgDOrRmmn/dRo3n96B1+ev44y/vMfo95ZTVl4ZdDyRuHXAonf3cmAEMIWqAn/Z3ReY2T1mdkF4tVvNbIGZfQbcCgwNzz8eKAzPnwb8ofrdOlI/ZaYmcdfZx/H2nT+gb4em/OHfiznn4ff5cNnGoKOJxCUNgSCBe2fxen4zeSFfbt7NuZ1b8stzj+eoRulBxxKJKRoCQaLaD49rzpt3nMqdZx3DW4vWc+YD7zH2gxWUV+h0jkgkqOglKqQlJ3LrGQW8decP6NUuh/teW8QFj37EnFVbg44mEvNU9BJV8nIyeGroSTx2ZXc27drHRY99xK/+OZ/te/cHHU0kZqnoJeqYGT/q3JK37vwB1/TJ57mZKznzL+/xxvy1QUcTiUkqeolaDdKS+c0FJ/DPm/vRNCuV4c/PZvhzRWzYvjfoaCIxRUUvUa9L60b8c0Q//nvAcUxbsoEzHniP8Z98qbHvRWpJRS8xITkxgRtP68Abt59Kx5bZ3P3KPK544mNWbtoVdDSRqKeil5jSrmkmL17fm99f3Jn5q7cx4KEPeOajzzVQmsj3UNFLzElIMIb0bMObd55Kr/Y5/ObVhQweM5MvNuroXqQmKnqJWS0bpvP00JO4/5IuLFq3nQEPv8+TH+roXuTbVPQS08yMS0N5TL3jB/Rp34R7/7WQwU/M5MtNu4OOJhI1VPQSF1o0TOOpr47u11Qd3Y/7WHfm
iICKXuLIV0f3b9xxKl3zGvH//j6Pa5+ZxXrddy/1nIpe4k6rRuk8/9Ne/Ob8jsxYvon+D77Pv+auCTqWSGBU9BKXEhKMof3a8fptp5DfNJMR4z7lzpfmaMwcqZdU9BLXOuRmMXF4H247o4B/zFnNOQ99wCefbw46lsgRpaKXuJecmMAdZx3DhOF9SUwwBo+Zwf1TFuv1hVJvqOil3ujRtjGv33YKl/bIY+S05Vwyejqf6yErqQdU9FKvZKUm8cdLujD6qu6s3LSb8/76AROLSnQbpsQ1Fb3USwM6teTft53CCa0a8l8TPuO28bpQK/FLRS/11lGN0nnx+t787KxjeG3eWs796wfM/nJL0LFEIk5FL/VaYoJxyxkFvHxDb9zh0tEzGP3eco2XI3FFRS8C9Gibw2u3nsLZJzTnD/9ezNBnZrFx576gY4lEhIpeJKxhejIjr+jOfQM7MXPFJn708AfMWL4p6FgidVarojezAWa2xMyKzezuGpYPNbNSM5sT/lxXbdk1ZrYs/LkmkuFFIs3MuKp3W/5xUz+y0pK4cuxMHpy6lAqdypEYdsCiN7NEYCRwDtARGGJmHWtY9SV37xr+jA1vmwP8GugF9AR+bWaNI5Ze5DDpeFQ2r444mYHdWvHw28u4+qmPKd2hUzkSm2pzRN8TKHb3Fe5eBowHLqzl958NTHX3ze6+BZgKDDi0qCJHVmZqEg9c1pU/XdKFopVb+NFfdSpHYlNtir4VsKradEl43rcNMrO5ZjbRzPIOZlszG2ZmhWZWWFpaWsvoIkfGZaE8/nFzPxqET+WMnFasu3IkpkTqYuyrQL67d6HqqP3Zg9nY3ce4e8jdQ7m5uRGKJBI5x7XIZvKIkzm3y1HcP2UJ1z4ziy27yoKOJVIrtSn61UBetenW4Xlfc/dN7v7VCcyxQI/abisSK7JSk/jr4K7cO7ATM5Zv4rxHPuSzVVuDjiVyQLUp+llAgZm1M7MUYDAwufoKZtay2uQFwKLwz1OA/mbWOHwRtn94nkhMMjN+3LstE2/sA1Q9YPXCxys1Vo5EtQMWvbuXAyOoKuhFwMvuvsDM7jGzC8Kr3WpmC8zsM+BWYGh4283AvVT9spgF3BOeJxLTurRuxL9uOZneHZrwy7/P52cTPmNPWUXQsURqZNF2JBIKhbywsDDoGCK1UlHpPPLOMh5+exnHNm/A6Kt6kN80M+hYUg+ZWZG7h2papidjReogMcG4/cxjeHroSazbvpfzH/2QdxavDzqWyDeo6EUi4LRjm/HqiJPJa5zBT54p5KG3luoWTIkaKnqRCMnLyeCVm/pycfdWPPTWMq7/WyHb9miMewmeil4kgtKSE/nLpSdy74Un8N7SUi589EOWrNsRdCyp51T0IhFmZvy4Tz7jh/Vmd1kFFz32Ef+etzboWFKPqehFDpNQfg6v3nIyx7ZowI0vzObPU5bovL0EQkUvchg1z05j/LDeXB7K49FpxVyn8/YSABW9yGGWmpTIHwZ15t6BnXh/aSkXjfyI4g06by9Hjope5Aj4auiEcdf3Zvve/QwcOV3328sRo6IXOYJ6tsth8oiTyW+awU+fLWTUu8s1To4cdip6kSPsqEbpTLihL+d2bskf31jM7S/NYe9+jZMjh09S0AFE6qP0lEQeGdKN41tm8+c3l7CidBdjru5By4bpQUeTOKQjepGAmBk3n340T/w4xIrSnVzw6Ed8+uWWoGNJHFLRiwTszI7N+fvN/UhLTuDyMTP55xy9m0ciS0UvEgWOad6Af958Ml3zGnHb+DncP2WxHq6SiFHRi0SJnMwUnv9pLwaflMfIacu58YUidu0rDzqWxAEVvUgUSUlK4PcXd+ZX53Vk6sL1XDJ6Bmu27gk6lsQ4Fb1IlDEzfnJyO54aehIlm3dz4ciPmKOXkEsdqOhFotRpxzZj0k19SU1K4PLHZ/CvuWuCjiQxSkUvEsWqLtL2o3OrhowY9ymPvL1MT9LKQVPRi0S5JlmpvHB9
Ly7q1oq/TF3KHXqSVg6SnowViQGpSYk8cNmJdMjN5M9vLqVkyx7GXB0iJzMl6GgSA3RELxIjzIwRPyzg0Su6MXf1Ni567COWl+4MOpbEABW9SIw5r8tRjB/Wm517y7lo5EdMX74x6EgS5WpV9GY2wMyWmFmxmd39PesNMjM3s1B4Ot/M9pjZnPBndKSCi9Rn3ds05h8396NZdhpXP/kJEwpXBR1JotgBi97MEoGRwDlAR2CImXWsYb0GwG3Ax99atNzdu4Y/wyOQWUSAvJwMJt3Yl97tm3DXxLn8ecoS3ZEjNarNEX1PoNjdV7h7GTAeuLCG9e4F/gjsjWA+EfkeDdOTefrakxh8UtU7aW9/aQ77ynVHjnxTbYq+FVD934Ul4XlfM7PuQJ67v1bD9u3M7FMze8/MTqnpLzCzYWZWaGaFpaWltc0uIkByYtWwCT8fcCz/nLOGH4/9hC27yoKOJVGkzhdjzSwBeAD4WQ2L1wJt3L0bcCcwzsyyv72Su49x95C7h3Jzc+saSaTeMTNuOu1oHhnSjTklW7l41HS+2Lgr6FgSJWpT9KuBvGrTrcPzvtIA6AS8a2ZfAL2ByWYWcvd97r4JwN2LgOXAMZEILiL/6fwTj2Lcdb3YuruMi0dNp2ilXmQitSv6WUCBmbUzsxRgMDD5q4Xuvs3dm7p7vrvnAzOBC9y90MxywxdzMbP2QAGwIuJ7ISJfC+Xn8MpN/chOS+KKJ2by73lrg44kATtg0bt7OTACmAIsAl529wVmdo+ZXXCAzU8F5prZHGAiMNzdN9cxs4gcQLummUy6sS8dj8rmpnGzefLDz4OOJAGyaLsdKxQKeWFhYdAxROLC3v0V3D5+Dm8sWMe1/fL5n3M7kphgQceSw8DMitw9VNMyPRkrEsfSkhMZeWV3ftKvHU9/9AU3vVCkAdHqIRW9SJxLTDB+dX5HfnVeR95cuJ4hT8xks26/rFdU9CL1xE9ObseoK7uzcM12Bo2azpebdgcdSY4QFb1IPTKgU0teuK4XW3aXcfGoj/hMryisF1T0IvVMKD+HSTf2JS05kcFjZvLO4vVBR5LDTEUvUg91yM3ilZv6cnSzLK57tpAXP/ky6EhyGKnoReqpZg3SGD+sN6cek8svXpnHg1OXavTLOKWiF6nHMlOTeOLqEJeFWvPw28u4e9I8yisqg44lEaZ3xorUc8mJCfxxUBdaZKfx13eKKd25j0ev6EZGiuohXuiIXkQwM+7sfyy/vagT7y7ZwJAxM9m0c1/QsSRCVPQi8rUre7Vl9FU9WLxuh+61jyMqehH5hv4ntGDc9b3Zumc/F4+azvzV24KOJHWkoheR/9CjbWMmDu9LalIClz8+gw+XbQw6ktSBil5EanR0s6p77fNyMrj2mU/455zVB95IopKKXkS+U/PsNF66oQ/d2zTmtvFzGPuB3hsUi1T0IvK9GqYn8+xPevKjzi2477VF/P71RVRW6sGqWKIbZUXkgNKSE3lkSHeaZi3g8fdXULpjH3+8pAvJiTpWjAUqehGplcQE438vOIHcrFT+MnUpm3eX8diV3fVgVQzQr2MRqTUz45YzCvjDxZ15f2kpQ574WC8xiQEqehE5aIN7tql6sGrtdi4ZPZ2SLXqwKpqp6EXkkPQ/oQXP/bQXG3fsY9Co6SxZtyPoSPIdVPQicsh6tsvh5eF9ALh09HRmfbE54ERSExW9iNTJcS2ymTi8L02zUrlq7Me8tVBvrIo2KnoRqbO8nAwmDO/DsS0acMPzRbxcuCroSFJNrYrezAaY2RIzKzazu79nvUFm5mYWqjbvF+HtlpjZ2ZEILSLRp0lWKuOu702f9k34+cS5jH5vedCRJOyARW9micBI4BygIzDEzDrWsF4D4Dbg42rzOgKDgROAAcBj4e8TkTiUlZrEU0NP4rwuLfnDvxfzOz1FGxVqc0TfEyh29xXuXgaMBy6sYb17gT8Ce6vNuxAY7+773P1zoDj8fSISp1KSEnh4cDeu
7tOWMe+v4K6Jc9mv1xMGqjZF3wqofsKtJDzva2bWHchz99cOdtvw9sPMrNDMCktLS2sVXESi11dP0d5x5jFMml3C8OeK2FNWEXSseqvOF2PNLAF4APjZoX6Hu49x95C7h3Jzc+saSUSigJlx25kF3DuwE+8s2cDVT33Mtt37g45VL9Wm6FcDedWmW4fnfaUB0Al418y+AHoDk8MXZA+0rYjEuR/3bsujQ7ozZ9VWLh8zgw3b9x54I4mo2hT9LKDAzNqZWQpVF1cnf7XQ3be5e1N3z3f3fGAmcIG7F4bXG2xmqWbWDigAPon4XohIVDu3S0ueubYnqzbvZtDo6XyxcVfQkeqVAxa9u5cDI4ApwCLgZXdfYGb3mNkFB9h2AfAysBB4A7jZ3XWiTqQe6nd0U8Zd35ude8u5ZPQMFqzRu2iPFHOPrlufQqGQFxYWBh1DRA6T4g07+fGTH7NzbzljrwnRq32ToCPFBTMrcvdQTcv0ZKyIHFFHN8ti0o19aZadytVPfcJUDZlw2KnoReSIO6pROhOG9+W4Fg0Y/nwRE4tKgo4U11T0IhKInMwUXggPmfBfEz7Ti8cPIxW9iAQmKzWJJ4eGvn7x+P1TFhNt1w3jgV72KCKBSk2qevF4w/T5jJy2nM279nPfwE4kJljQ0eKGil5EApeYYPzuok7kZCYzctpytu0p48HLu5KapDEQI0FFLyJRwcy46+zjaJyRwn2vLWLH3kJGX9WDzFTVVF3pHL2IRJXrTmnP/Zd0YfryTVwx9mO27CoLOlLMU9GLSNS5NJTHqCu7s2jtdi57fAbrtml8nLpQ0YtIVOp/QguevbYna7ftZdCo6awo3Rl0pJiloheRqNWnQxNevL43e/ZXcOnoGcxfrfFxDoWKXkSiWufWDZkwvA+pSQkMGTOTj1dsCjpSzFHRi0jU65CbxcRq4+O8vUjj4xwMFb2IxISvxsc5tkUDhj1XxD8+1TuMaktFLyIxIyczhXHX96Znfg63vzSHZz76POhIMUFFLyIxJSs1iaevPYn+HZvzm1cX8uDUpRof5wBU9CISc9KSE3nsyu5c0qM1D7+9jP99dSGVlSr776Jni0UkJiUlJvCnQV1olJ7M2A8/Z9ue/fzpki4kJ+r49dtU9CISsxISjF+eezyNM1O4f8oStu/Zz8gru5OWrMHQqtOvPhGJaWbGzacfzX0DO/HOkg1c/eQnbN+7P+hYUUVFLyJx4arebfnr4G7M/nILgx+fSemOfUFHihoqehGJG+efeBRjrwmxYuNOLnt8Bqs27w46UlRQ0YtIXDnt2Ga8cF0vNu3cx6WjZ7Bs/Y6gIwVORS8icadH2xxeuqEPFe5c+vgM5qzaGnSkQNWq6M1sgJktMbNiM7u7huXDzWyemc0xsw/NrGN4fr6Z7QnPn2NmoyO9AyIiNTm+ZTaThvclOy2ZK56YyUfFG4OOFJgDFr2ZJQIjgXOAjsCQr4q8mnHu3tnduwJ/Ah6otmy5u3cNf4ZHKLeIyAG1aZLBxOF9aJOTwbVPz+KN+WuDjhSI2hzR9wSK3X2Fu5cB44ELq6/g7turTWYCekRNRKJCs+w0XhrWh06tsrnphdm8PGtV0JGOuNoUfSug+v8yJeF532BmN5vZcqqO6G+ttqidmX1qZu+Z2Sk1/QVmNszMCs2ssLS09CDii4gcWMOMZJ6/rhcnF+Ty80lzGfP+8qAjHVERuxjr7iPdvQPw38D/hGevBdq4ezfgTmCcmWXXsO0Ydw+5eyg3NzdSkUREvpaRksTYq0Oc16Ulv3t9MX98Y3G9GQytNkMgrAbyqk23Ds/7LuOBUQDuvg/YF/65KHzEfwxQeEhpRUTqICUpgYcHdyM7PZlR7y5n6+4y7hvYmcQECzraYVWbop8FFJhZO6oKfjBwRfUVzKzA3ZeFJ88FloXn5wKb3b3CzNoDBcCKSIUXETlYiQnGbwd2onFGMiOnLWfbnv08eHlXUpPid3ycAxa9u5eb2QhgCpAI
POXuC8zsHqDQ3ScDI8zsTGA/sAW4Jrz5qcA9ZrYfqASGu/vmw7EjIiK1ZWbcdfZxNM5I4b7XFrFjbyGjr+pBZmp8jvNo0XaOKhQKeWGhzuyIyJExoXAVd78yj06tGvLM0JNonJkSdKRDYmZF7h6qaZmejBWReu3SUB6jruzOorXbuezxGazbtjfoSBGnoheReq//CS149tqerN22l0GjprOidGfQkSJKRS8iAvTp0ITxw3qzd38Fl46ewfzV24KOFDEqehGRsE6tGjJheB/SkhMZPGYmM5ZvCjpSRKjoRUSqaZ+bxcQb+9CiYRrXPP0Jby5YF3SkOlPRi4h8S8uG6Uy4oQ/Ht8zmxhdmM6EwtsfHUdGLiNSgcWYK467rRZ/2Tbhr4lyeeD92n/VU0YuIfIfM1CSeHBriR51b8NvXF8Xs+Djx+RiYiEiEpCYl8siQ7jTKmM+od5ezZVcZ9w3sRFJi7Bwnq+hFRA7gq/FxmmSm8Mg7xWzZXcbDg7uRlhwb4+PEzq8kEZEAmRk/638svz6/I1MWrOfap2exY+/+oGPViopeROQgXNuvHQ9d3pVZX2xmyBMz2bhzX9CRDkhFLyJykAZ2a8UTV4co3rCTS0fPYNXm3UFH+l4qehGRQ3D6cc144bpebNq5j0GjprNk3Y6gI30nFb2IyCHq0TaHCcP7YgaXjp5O4RfR+boNFb2ISB0c26IBE4f3pUlWKlc9+THvLF4fdKT/oKIXEamjvJwMJgzvQ0GzBlz/tyImFZUEHekbVPQiIhHQNCuVF4f1pnf7HH424bOoGjJBRS8iEiFZqUk8NfQkzu3Skt++vojfv74oKoZM0JOxIiIRlJqUyF8HdyMnI4XH31/Bpl1l/OHizoEOmaCiFxGJsMQE454LT6BJVgoPvbWMLbvKePSK7qSnBDNkgk7diIgcBmbG7Wcew30DO/HOkg1c9eTHbN1dFkgWFb2IyGF0Ve+2PHZFd+aVbOPS0TNYs3XPEc+gohcROczO6dySZ3/Sk3Xb9jJo1HSKNxzZp2hrVfRmNsDMlphZsZndXcPy4WY2z8zmmNmHZtax2rJfhLdbYmZnRzK8iEis6NOhCS/d0IfySueS0TMoWrnliP3dByx6M0sERgLnAB2BIdWLPGycu3d2967An4AHwtt2BAYDJwADgMfC3yciUu90PCqbScP70ig9mSvHzuTtRUfmKdraHNH3BIrdfYW7lwHjgQurr+Du26tNZgJf3Th6ITDe3fe5++dAcfj7RETqpTZNMph4Y18KmjVg2HNFvHwEXjxem6JvBVRPUhKe9w1mdrOZLafqiP7Wg9lWRKQ+aZqVyvhhvenboQk/nziXkdOKD+uDVRG7GOvuI929A/DfwP8czLZmNszMCs2ssLS0NFKRRESiVmZqEk9ecxIDux7F/VOW8JvJC6ioPDxlX5uiXw3kVZtuHZ73XcYDAw9mW3cf4+4hdw/l5ubWIpKISOxLSUrggcu6cv0p7Xh2xkpuffHTw1L2tXkydhZQYGbtqCrpwcAV1VcwswJ3XxaePBf46ufJwDgzewA4CigAPolEcBGReJCQYPzy3I40a5DG9r37SUywiP8dByx6dy83sxHAFCAReMrdF5jZPUChu08GRpjZmcB+YAtwTXjbBWb2MrAQKAdudveKiO+FiEiMu/7U9oftuy0aRlarLhQKeWFhYdAxRERiipkVuXuopmV6MlZEJM6p6EVE4pyKXkQkzqnoRUTinIpeRCTOqehFROKcil5EJM5F3X30ZlYKrKzDVzQFNkYoTtDiaV8gvvYnnvYFtD/RrLb70tbdaxxDJuqKvq7MrPC7HhqINfG0LxBf+xNP+wLan2gWiX3RqRsRkTinohcRiXPxWPRjgg4QQfG0LxBf+xNP+wLan2hW532Ju3P0IiLyTfF4RC8iItWo6EVE4lzcFL2ZDTCzJWZWbGZ3B52nLszsKTPbYGbzg85SV2aWZ2bTzGyhmS0ws9uCzlQXZpZmZp+Y
2Wfh/fnfoDPVlZklmtmnZvavoLPUlZl9YWbzzGyOmcX8iy3MrJGZTTSzxWa2yMz6HNL3xMM5ejNLBJYCZwElVL3+cIi7Lww02CEys1OBncDf3L1T0HnqwsxaAi3dfbaZNQCKgIEx/P+NAZnuvtPMkoEPgdvcfWbA0Q6Zmd0JhIBsdz8v6Dx1YWZfACF3j4uHpczsWeADdx9rZilAhrtvPdjviZcj+p5AsbuvcPcyql5QfmHAmQ6Zu78PbA46RyS4+1p3nx3+eQewCGgVbKpD51V2hieTw5+YPVoys9ZUved5bNBZ5JvMrCFwKvAkgLuXHUrJQ/wUfStgVbXpEmK4TOKVmeUD3YCPA45SJ+FTHXOADcBUd4/l/XkI+DlQGXCOSHHgTTMrMrNhQYepo3ZAKfB0+NTaWDPLPJQvipeilyhnZlnAJOB2d98edJ66cPcKd+8KtAZ6mllMnl4zs/OADe5eFHSWCDrZ3bsD5wA3h0+DxqokoDswyt27AbuAQ7r+GC9FvxrIqzbdOjxPokD4XPYk4AV3fyXoPJES/mf0NGBAwFEOVT/ggvB57fHAD83s+WAj1Y27rw7/uQH4O1WndWNVCVBS7V+ME6kq/oMWL0U/Cygws3bhCxaDgckBZxK+vnj5JLDI3R8IOk9dmVmumTUK/5xO1Q0AiwMNdYjc/Rfu3trd86n6b+Ydd78q4FiHzMwywxf8CZ/i6A/E7J1r7r4OWGVmx4ZnnQEc0k0MSRFLFSB3LzezEcAUIBF4yt0XBBzrkJnZi8BpQFMzKwF+7e5PBpvqkPUDfgzMC5/XBvh/7v56cJHqpCXwbPhOrwTgZXeP+dsS40Rz4O9VxxYkAePc/Y1gI9XZLcAL4QPYFcC1h/IlcXF7pYiIfLd4OXUjIiLfQUUvIhLnVPQiInFORS8iEudU9CIicU5FLyIS51T0IiJx7v8DUWr/VeVn0EQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "data = wineset3()  # create the third dataset\n",
    "# print(data)\n",
    "cumulativegraph(data,(1,1),120)   # plot the cumulative probability\n",
    "probabilitygraph(data,(1,1),6)    # plot the probability density"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42839a7e-71a5-47f6-a8db-23537b888b04",
   "metadata": {},
   "source": [
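    "# Choosing k in the k-nearest-neighbor algorithm and the importance of feature normalization\n",
    "\n",
    "\n",
    "A small k makes the overall model more complex and prone to overfitting. The prediction then depends on only a few neighbors, so noise (such as the black dot close to the pentagon in the figure above) is easily learned into the model and the true distribution of the data is ignored.\n",
    "\n",
    "A k that is too large makes the model too simple: distant, dissimilar samples also get a vote, so much of the useful information in the training instances is ignored. This is not desirable either.\n",
    "\n",
    "So how is k usually chosen? Following Dr. Li Hang's book, we generally pick a relatively small value and use cross-validation to select the best k. In other words, choosing k is mainly a matter of experimental tuning, much like deciding how many layers a neural network should have: the hyperparameter is adjusted until the result is good.\n"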
   ]
  },
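  {
   "cell_type": "markdown",
   "id": "knn-choose-k-cv-sketch",
   "metadata": {},
   "source": [
    "Selecting k by cross-validation can be sketched roughly as follows. This is a minimal illustration and not the notebook's own implementation: the helper names `knn_predict` and `cv_accuracy`, the toy two-cluster data, the 5-fold split and the candidate k values are all assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "def knn_predict(train, query, k):\n",
    "    # train: list of (feature_tuple, label) pairs (hypothetical layout)\n",
    "    # sort training rows by squared Euclidean distance to the query\n",
    "    nearest = sorted(\n",
    "        train,\n",
    "        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),\n",
    "    )[:k]\n",
    "    # majority vote among the k nearest neighbors\n",
    "    votes = {}\n",
    "    for _, label in nearest:\n",
    "        votes[label] = votes.get(label, 0) + 1\n",
    "    return max(votes, key=votes.get)\n",
    "\n",
    "def cv_accuracy(data, k, folds=5):\n",
    "    # each fold is held out once; the remaining folds act as training set\n",
    "    correct = 0\n",
    "    for i in range(folds):\n",
    "        held_out = data[i::folds]\n",
    "        train = [row for j, row in enumerate(data) if j % folds != i]\n",
    "        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)\n",
    "    return correct / len(data)\n",
    "\n",
    "random.seed(0)\n",
    "# made-up toy data: two Gaussian clusters labeled 0 and 3\n",
    "data = [\n",
    "    ((random.gauss(c, 1.0), random.gauss(c, 1.0)), c)\n",
    "    for c in (0, 3)\n",
    "    for _ in range(50)\n",
    "]\n",
    "random.shuffle(data)\n",
    "\n",
    "# evaluate a few candidate k values and keep the one with the best accuracy\n",
    "scores = {k: cv_accuracy(data, k) for k in (1, 3, 5, 7, 9, 15)}\n",
    "best_k = max(scores, key=scores.get)\n",
    "print(scores)\n",
    "print('best k:', best_k)\n",
    "```\n",
    "\n",
    "On real data the features would normally be rescaled to comparable ranges before computing distances, and the candidate k values would be chosen to suit the size of the dataset."
   ]
  },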
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54a55476-a758-4751-ab61-fefefb207523",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  },
  "vscode": {
   "interpreter": {
    "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
